A Transitive Model for Extracting Translation Equivalents of Web Queries through Anchor Text Mining
Abstract
A persistent difficulty in cross-language information retrieval (CLIR) and Web search is the lack of appropriate translations for new terminology and proper names. In our previous research we departed from conventional approaches by exploiting Web anchor texts as live bilingual corpora, which reduced the difficulty of query term translation. Although Web anchor texts are undoubtedly valuable multilingual, wide-scoped hypertext resources, not every language pair has enough anchor texts on the Web to extract the corresponding translations. To generalize the approach, in this paper we extend it with a phase of transitive (indirect) translation via an intermediate (third) language, and propose a transitive model that further exploits anchor-text mining for term translation extraction. Preliminary experimental results show that many query translations that cannot be obtained with the previous approach can be extracted with the improved one.
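The abstract does not give the model's formula, so the following is only a minimal sketch of the general idea of transitive translation, under stated assumptions: candidate scores from the source language to an intermediate language are composed with scores from the intermediate language to the target language, then mixed with any direct scores. The function names, the sum-of-products composition, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def transitive_scores(src_to_mid, mid_to_tgt):
    """Compose two score tables through an intermediate language.

    src_to_mid: {intermediate_term: score} for one source query term.
    mid_to_tgt: {intermediate_term: {target_term: score}}.
    Assumption: the indirect score is the sum over intermediate terms
    of the product of the two scores.
    """
    scores = defaultdict(float)
    for mid_term, s_sm in src_to_mid.items():
        for tgt_term, s_mt in mid_to_tgt.get(mid_term, {}).items():
            scores[tgt_term] += s_sm * s_mt
    return dict(scores)


def combined_scores(direct, src_to_mid, mid_to_tgt, weight=0.5):
    """Linearly mix direct and indirect scores (weight is hypothetical)."""
    indirect = transitive_scores(src_to_mid, mid_to_tgt)
    candidates = set(direct) | set(indirect)
    return {
        t: weight * direct.get(t, 0.0) + (1 - weight) * indirect.get(t, 0.0)
        for t in candidates
    }


# Toy, made-up scores: no direct candidates exist for the language pair,
# but two intermediate-language terms lead to target candidates.
src_to_mid = {"mid_a": 0.5, "mid_b": 0.5}
mid_to_tgt = {"mid_a": {"tgt_x": 1.0}, "mid_b": {"tgt_x": 0.5, "tgt_y": 0.5}}
ranked = sorted(
    combined_scores({}, src_to_mid, mid_to_tgt).items(),
    key=lambda kv: kv[1],
    reverse=True,
)
```

In this toy case the direct table is empty, which mirrors the situation the abstract describes: a language pair with too few anchor texts still receives candidates through the intermediate language.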
Similar resources
Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries
This paper proposes an efficient client-server-based query translation approach that allows a more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...
LiveTrans: Translation Suggestion for Cross-Language Web Search from Web Anchor Texts and Search Results
In this paper we will present a system, called LiveTrans, which can generate translation suggestions for given user queries and provide an English-Chinese cross-language search service for the retrieval of both Web pages and images. The system effectively utilizes two kinds of Web resources: anchor texts and search results. The developed anchor-text-based and search-result-based methods are com...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora
The purpose of this paper is to automatically create multilingual translation lexicons with regional variations. We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues of geographic information obtained from Web search engines. The experimental resu...
Mining Parenthetical Translations for Polish-English Lexica
Documents written in languages other than English sometimes include parenthetical English translations, usually for technical and scientific terminology. Techniques had been developed for extracting such translations (as well as transliterations) from large Chinese text corpora. This paper presents methods for mining parenthetical translation in Polish texts. The main difference between translati...
Publication date: 2002